Entropy of complex relevant components of Boolean networks 
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Boolean network models of strongly connected modules are capable of capturing the high regu- 
latory complexity of many biological gene regulatory circuits. We study numerically the previously 
introduced basin entropy, a parameter for the dynamical uncertainty or information storage capacity 
of a network as well as the average transient time in random relevant components as a function of 
their connectivity. We also demonstrate that basin entropy can be estimated from time-series data 
and is therefore also applicable to non-deterministic networks models. 



I. INTRODUCTION 

Random Boolean networks are often studied as generic 
models of gene regulatory networks [2| P| ■ The ensemble 
approach to gene regulation, a term coined by Kauff- 
man, aims at studying well-defined ensembles of model 
networks, the statistical features of which match those of 
real cells and organisms [|| . Ensembles of special biolog- 
ical interest are critical random Boolean networks, which 
lie at the boundary between a frozen and a chaotic phase 
13 @ • I n the frozen phase, a perturbation to one node 
propagates on average to less than one other node during 
one time step. In the chaotic phase, the difference be- 
tween two almost identical states increases exponentially 
fast, since a perturbation propagates on average to more 
than one node during one time step Q ■ Critical Boolean 
networks balance robustness against perturbations with 
complex asymptotic attractor dynamics. 

Since Boolean networks are deterministic systems, 
they eventually settle into periodic attractors. Regard- 
ing the behavior in this asymptotic limit, nodes can be 
classified into three groups. Nodes that are frozen to the 
same value on every attractor form the frozen core of a 
network These nodes give a constant input to other 
nodes and are otherwise irrelevant. Nodes that are not 
frozen but have no outputs, or only outputs to other ir- 
relevant nodes, are also classified as irrelevant. Although 
they might fluctuate, they do not influence the number 
and periods of attractors. Finally, the relevant nodes are 
those non-frozen nodes that influence their own state over 
directed loops. The recognition of the relevant nodes as 
the only elements influencing the asymptotic dynamics 
was an important step in understanding the dynamics of 
Boolean networks @ [l(| • 

In a biological context, an attractor is associated with 
a characteristic dynamically stable pattern of gene ex- 
pression, which may represent a particular fate of the 
cell. The weight of each such attractor can be defined 
as the probability for a random state in the state space 
of the network to flow into this attractor. Based on the 
state space partition into attractors of different weights, 
we recently introduced a new network parameter, called 
the basin entropy (hereafter, simply entropy), which mea- 
sures the uncertainty of the future behavior of a random 
state. This entropy was shown to increase with system 



size in critical network ensembles, whereas for ensembles 
in the ordered phase and in highly chaotic networks, it 
approaches a finite value [31] . From an informational pro- 
cessing perspective, this means that the complexity of a 
network increases only with its system size if it operates 
near the critical regime. 

An intuitive understanding of this growing complex- 
ity are networks whose relevant nodes are modularly 
organized and whose complexity increases if new mod- 
ules accumulate. In living systems, transcriptional reg- 
ulation is often highly modular [l3[[l2j]. Of special in- 
terest are complex relevant components, which consist 
of relevant nodes containing more than one relevant in- 
put/output Boolean network models for several bi- 
ological gene regulatory circuits have been constructed 
and were shown to reproduce experimentally observed re- 
sults [l6| [17[ [15| [lj] . These often highly-connected sub- 
networks can be viewed as biological examples of such 
complex relevant components. 

In this work, we first numerically study the entropy 
of complex relevant components as a function of their 
connectivity. We show that the probability of such a 
component to freeze increases with growing connectiv- 
ity and that its entropy decreases. Additionally, we also 
study the average transient time of a random state un- 
til it falls into its attractor. The calculation of dynamic 
network parameters, such as the basin entropy, requires 
the knowledge of the underlying network functions. This 
often limits the applicability of such parameters. We 
demonstrate that the entropy of a network can also be 
estimated from time series data by the application of clus- 
tering techniques. This broadens the applicability of this 
dynamic network parameter to models whose functions 
are not necessarily known or that are not deterministic. 
In order to illustrate our results, we will use a Boolean 
network model for the mammalian cell cycle as presented 
in 



II. BOOLEAN NETWORKS 

In a Boolean network every gene is identified by a node 
i, whose state Xi € {0,1} indicates whether the gene is 
switched on or off. Each node i receives input from k% 
other nodes and its state at the next time step t + 1 is a 
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Boolean function /j of the states of the nodes it is depend- 
ing on: Xi(t + 1) = fi{x ix (t) , ...,x iki (t)). A network is 
entirely defined by its directed connectivity graph G and 
the Boolean functions F = {/i, /„} assigned to every 
node. The state of a network x(i) = (x\ (t) , ...,x n (t)) 
contains the values of all n nodes in the network at a 
given time point t. A Boolean network thus defines a 
deterministic transition matrix T on its state space. A 
random state x £ {0, l} n that is propagated through the 
network as 

x(t + l) = F(x(i)) (1) 

generates a time series or trajectory through the state 
space that finally intersects with itself. The states that 
are then revisited infinitely often define the attractor p, 
with the number of states on the attractor equal to l(p), 
also called the attractor size. Transient states that lead 
into an attractor form its basin of attraction. The sum of 
all attractor states and basin states of a certain attrac- 
tor normalized by the size of the entire state space (2 ra ) 
define the weight w p of that attractor. The weight of an 
attractor is the probability that a randomly chosen state 
will flow into this very attractor. The entropy h of the 
probability distribution defined by w p is called the basin 
entropy: 

h = —*^^Wp\nw p . (2) 

p 

This measure captures the uncertainty of the future dy- 
namic behavior of the system started in a random state. 
The scaling of this parameter relative to system size was 
discussed in [3l| for ensembles of different dynamical 
regimes. 

The sensitivity of a node i specifies the number of its 
input arguments that, when toggled, will result in a flip 
in the value of that node, averaged over all input com- 
binations. A Boolean function /$ with ki input variables 
that takes on the value 1 for any one of its possible 2 ki 
input vectors with probability pi has the expected sensi- 
tivity [18(: 

Si = 2fc l p,(I -p^ (3) 

The network sensitivity s is the average sensitivity of 
all its nodes and indicates to how many nodes a pertur- 
bation to a single node is, on average, propagated. The 
network sensitivity is an order parameter of a network 
ensemble that divides random Boolean networks with an 
expected sensitivity of s < 1 into the ordered phase and 
with s > 1 into the chaotic phase. Random networks that 
propagate a perturbation, on average, to one other node 
(s = 1), are called critical. Classical Kauffman networks 
with a fixed indegree k = 2 and p — 0.5, are prototypes 
of critical Boolean network ensembles, though it should 
be noted that this definition of criticality is independent 
of the actual indegree distribution. 




FIG. 1: Random complex component of excess e = 0, 1, 6 
from left to right. 



The attractor dynamics of a Boolean network are en- 
tirely determined by its relevant nodes. The scaling of 
these nodes was first discussed in [l9[ and derived for 
a general class of critical network ensembles by Drossel, 
Kaufman and Mihaljev [2(j [ll| They presented a stochas- 
tic process that removes frozen nodes and nonfrozen ir- 
relevant nodes from a network and ends when there are 
only potentially relevant nodes left. We call these nodes 
'potentially relevant', as some of them may be part of self- 
freezing loops. We will discuss this phenomenon below 
in more detail. The number of these potentially relevant 
nodes n r scales in all critical networks of size n and aver- 
age connectivity k > I as n r ~ n 1 / 3 . In the limit of large 
n, almost all these relevant nodes depend on only one rel- 
evant node; the proportion of relevant nodes depending 
on more than one relevant input approaches zero; and 
the number of nodes depending on more than two rel- 
evant inputs is almost surely zero. These results agree 
with structural findings of random graphs at the point 
of phase transition, where the number of nodes in com- 
plex components scales with n 1 / 3 and each such complex 
component is almost surely topologically equivalent to a 
3-regular multigraph [22| . 

III. RANDOM COMPLEX RELEVANT 
COMPONENTS 

A set of relevant nodes that eventually influence each 
other's state form a relevant component. We define the 
excess e of the connection digraph of a component in 
analogy to the graph theoretic terminology as the differ- 
ence between the number of arrows a between the nodes 
and the number of nodes n: 

e = a — ii. (4) 

The topology of the simplest relevant component is a 
loop of nodes. The excess of this component is e = 0. 
If we randomly add a new link to this loop we have an 
intersected loop of the same size with excess e = 1 . One 
node now depends on two nodes and one node influences 
two nodes 1 . 



In the Boolean network literature only relevant components with 
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By randomly adding further edges, we may generate 
components of arbitrary excess (see Fig. [T|). 

In a simple relevant loop only the Boolean 'copy' ( 
Xi(t + 1) = Xj(t) ) and 'negation' ( Xi{t + 1) = x](t) ) 
functions can be used. A perturbation in a loop node is 
always propagated to a single successor node and there- 
fore such components have sensitivity s = 1. If another 
edge is added, the Boolean function of the node receiving 
input from two relevant inputs has to be changed. Gen- 
erally, whenever the indegree of a node increases from 
k to k + 1 variables, a new Boolean updating function 
for k + 1 variables has to be assigned. By adjusting the 
bias p (see eq. ([3])), we can randomly generate a Boolean 
function of k + 1 variables so that its expected sensitiv- 
ity is E (s) = 1. If any of the k + 1 variables happen to 
be fictitious (i.e., toggling their value has no effect on the 
output), we can draw again in order to guarantee that all 
added edges are irreducible. This process thus generates 
random components of arbitrary excess, operating in the 
critical or slightly supracritical regime. 

IV. ENTROPY OF RELEVANT COMPONENTS 

The only relevant component whose entropy we can 
discuss analytically is the simple loop. In simple loops 
all states are attractor states and it is therefore sufficient 
to know the length l(p) of an attractor in order to deter- 
mine its weight, w p = l(p)/2 n . Regarding the attractor 
dynamics, we can substitute an even number of 'nega- 
tion' functions in a loop by 'copy' functions, so that it 
is sufficient to discuss loops with an even number or an 
odd number of 'negations'. In even loops, the attrac- 
tor states that only differ by a rotation of their values 
belong to the same attractor and attractors can there- 
fore be viewed as an equivalence class under rotation. In 
combinatorics such an equivalence class is also known as 
a binary necklace of length n. The number of attractors 
of a simple even loop is calculated as follows [H[ : 

d\n 

where <fi(m) is Euler's totient function, which is defined 
as the number of positive integers < m that are relatively 
prime to m (i.e., do not contain any factor in common 
with m). The sum is taken over all dividers of n. If n 
is prime, the number of attractors of length n is simply 
(2™ - 2) /n. 

In odd loops, a state x and its negation x are always 
part of the same attractor and the number of attractors 
can be calculated with the formula: 



an excess e > are called complex components. This might cause 
confusion for readers familiar with the graph theoretic terminol- 
ogy, where a component is called complex if its excess is e > 0. 



NOdd( v = f h E„|„ </>WW d + |2t, if n is even 
A [H> \ h Edln ^/ d ) 2d + 2^ , if n is odd 

For n prime, the number of attractors of length 2n is 
(2™ — 1) /2n. For large even (odd) loops, most attractors 
are of length n (2n) and the entropy can be approxi- 
mated by considering only those attractors that are of 
maximal weight, h(n) rs nln2 — O(lnn). Therefore, the 
entropy of simple loops scales linearly with its size under 
synchronous updating. 

In terms of the entropy, the key difference between 
complex components with excess e > 1 compared to sim- 
ple loops is that the attractor length and weight are no 
longer correlated. In complex components, attractors of 
length one may have even higher weights than larger at- 
tractors in the same network. The mean number and 
length of attractors of a random component are insuffi- 
cient in describing its dynamic complexity and we choose 
to study the entropy as a function of increasing excess 2 . 

When we start increasing the excess of our compo- 
nent from e — 0, two qualitatively different things may 
happen: either all nodes stay relevant and only the at- 
tractor dynamics change, or parts of the component or 
perhaps even the entire component freezes. This spe- 
cial case of self- freezing loops was first discussed in [271 ]. 
The simplest case of a self-freezing component are two 
nodes that canalize each other to their majority bits, e.g. 
fi = x\ V X2, $2 — %i V 3>2~. Such components are clearly 
not relevant components, but part of the frozen core of a 
network. In Fig. [2] the probability that a critical compo- 
nent of n = 10 nodes freezes is shown for increasing excess 
e. The average was taken over more than 26,000 net- 
work instances for every excess e. We obtained the same 
qualitative progress for component sizes up to n = 18 3 : 
The addition of the first few edges to a loop strongly in- 
creases the probability to freeze, whereas the probability 
to freeze increases slower in components of already high 
excess. 

If a component becomes frozen, or only has a single 
attractor, it has entropy h — and we will not consider 
it as a relevant component. Fig[3] shows the decline in 
the average entropy (h) of the relevant n = 10 node com- 
ponents as a function of their excess. The entropy drops 
sharply for the first few additional edges and decreases 
slower for e > 10. For all studied component sizes n up 
to 18, the average entropy falls below 1 for e = 10 and 



2 For a thorough and mainly analytical discussion of the mean 
number and length of attractors in relevant components of excess 
e = 1, see [H] 

3 The computational time to calculate the entropy increases ex- 
ponentially with system size. We therefore chose larger incre- 
ments for the excess in larger components: n = 12, 14, 16, 18, 
e = 1,2,..., 10,15, 20, ...,90. Over 2000 network instances have 
been simulated for every ensemble. 
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FIG. 2: The probability of a n — 10 node component to freeze 
increases roughly logarithmically with increasing excess e. 
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FIG. 3: Average entropy (h) of a complex relevant component 
with increasing excess e. 



continues to decrease slower thereafter. 

As soon as additional edges are introduced into the 
loop, the average number of attractor states drops sub- 
stantially and the majority of states become transient 
states. The average number of steps that are required 
to reach an attractor from a random state in the state 
space is defined as the average transient time (x) of a 
network. In a simple loop, where all states are attrac- 
tor states, the transient time is zero. If we increase the 
excess, the transient time first sharply increases, peaks 
around an excess of e = 2, (x)™^ (e = 3) = 10 and be- 
gins to decrease thereafter as shown in Fig. 2] We obtain 
qualitatively the same behavior for the transient time in 
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FIG. 4: The average transient time decreases after a peak 
around e ~ 2 with increasing excess. 



1000^ 




FIG. 5: Average transient time (x) in random network en- 
sembles of increasing sensitivity s. 



components of sizes up to n = 18, ((x)™=i8 (e = 3) « 28, 
(X) n=18 (e=100)«3). 

We also studied the average transient time as a func- 
tion of the average sensitivity (Fig. [5]). The average 
transient time starts to grow rapidly, soon after the en- 
sembles enter the chaotic regime (s > 1) and scales with 
the system size in highly chaotic networks. 

Compared to the average transient times in networks 
of higher sensitivity s, even the maximal transient times 
in the critical relevant components (Fig. [5]) are small. 
The updating functions of nodes of the critical compo- 
nent that depend on more than one relevant input are 
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more likely to be canalizing 4 because of the stronger bias 
p that was used in their generation process [3(| ■ Canaliz- 
ing functions are found frequently as control rules govern- 
ing the transcription in eukaryotic genes [28]. Dynamic 
properties of random Boolean networks with canalizing 
functions have also been studied in [l|,[23|- A higher pro- 
portion of canalizing functions leads characteristically to 
short attractors and more robust dynamics. 
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V. BOOLEAN NETWORKS MODELING 
BIOLOGICAL CIRCUITS 



As a showcase model for a biological gene regulatory 
circuit, we now analyze the Boolean network of the mam- 
malian cell cycle as presented in [15(. This n = 10 node 
network simulates the states of cell cycle genes that reg- 
ulate the process of cell division and its quiescent Go 
phase. It can be viewed as a biological example of a 
complex relevant component of highly connected nodes 
(for a more thorough biological discussion, see (Hfl). The 
Boolean functions and attractor states of this network 
are shown in the tables below. The value on the left 
hand side of the equations corresponds to the value at 
time t + 1, Xi(t+ 1) = f{x-i 1 (t), Xi k . (t)), with the sym- 
bols + and • representing the OR and AND operations, 
respectively: 



Gene 


Boolean function 


CycD 


x\ = xi 


Rb 


x 2 = {x A Xr + X 6 )XlX W 


E2F 


X3 = (X5 + X 6 )X2X W 


CycE 


xa = x^xi 


CycA 


X5 = (x 3 + X 5 ) (x 5 + X S ) X2X7 


P 27 


Xq = (X4X 5 + X e Xi + X e X 5 ) XiXio 


Cdc20 


x 7 = x w 


Cdhl 


x 8 = x 5 x 10 +x 7 + x 6 x w 


UbcHIO 


x 9 = Xs + x 8 x 9 (x 7 + x 6 + x w ) 


CycB 


Xw = X7X$ 



4 A function is canalizing for the value cr; = {0, 1} of variable i 
if this value can determine the function value regardless of the 
values of the other input variables. 



The attractor of length seven represents the different 
steps of the cell cycle phases, Gi,S,G2 and M, whereas 
the fixed point attractor represents the Go phase. Both 
attractors have the same weight w — 0.5, which yields 
an entropy of h — In 2 ss 0.69. If we average over the 
sensitivities of the single nodes, we obtain a network sen- 
sitivity of s — 1.04 which lies in the range of the expected 
average sensitivity for a relevant component of a critical 
network. The average transient time of this network is 
(x) = 4.8, which is on the order of a random relevant 
component with the same excess (e = 24). 

So far, other detailed deterministic Boolean models 
have only been presented for a few other gene regulatory 
circuits. A prominent example among those is a model 
for the segment polarity gene network, which governs 
the embryonic pattern formation during certain stages 
of the developmental process in the fruit fly Drosophila 
melanogaster [161 ]. This Boolean network models the in- 
teraction between eleven genes and its products and de- 
fines certain fixed-point (single-state) attractors that can 
be identified as stable gene expression patterns in wild- 
type embryos. For this model, we also find the dynamic 
network parameters to lie within the range of critical 
complex components: for the network sensitivity we ob- 
tain s = 1.02, the entropy is h = 2.17 and the transient 
time (x) — 3.6. 

Generally, the identification of a single deterministic 
logical function for a gene is often difficult for experi- 
mental reasons [l7j . or determinism might not even be 
a desired feature of the modeling approach itself. For 
example, probabilistic Boolean networks consider more 
than just one Boolean function as possible updating rules 
for a gene [24]. Also, asynchronous up dating schemes 
lead to non-deterministic dynamics [3a]. We therefore 
conclude with a section in which we sketch an approach 
that enables us to extend the concept of basin entropy to 
non-deterministic models. 
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VI. ENTROPY ESTIMATED FROM 
TIME-SERIES DATA 

In an unperturbed Boolean network a trajectory tl 
started from a random point in the state space will fina 
be caught in a single attractor. If we allow small rand< 
perturbations in the updating procedure, the trajectt 
will jump out of its attractor from time to time and rr 
settle in another attractor. The deterministic dyna 
ics of the unperturbed network give way to a stochas 
(and ergodic) Markov process with transition probal 

ities pij = P(x f = i|x t _i = i), such that Y%=iPV = 
[36| . The transition probabilities can be arranged in an 
ordered fashion in a stochastic state transition matrix, 

/ Pn P12 ■ ■ ■ \ 

P = \ P21 P22 ■ ■ ■ . (7) 



Let 7r (0) be the vector of initial state probabilities at 
time t = 0. We may calculate the state probability dis- 
tribution 7r (to) after to steps: 

7r(m) =7r(0)P m . (8) 

We may further sum up the probabilities of states, lead- 
ing to the same attractor, to get a probability distribu- 
tion v m for the attractors after to time steps. The steady- 
state probabilities (to — > oo) for attractors in Boolean 
networks with perturbations were studied in (37j . 

Let us consider the following two-node network defined 
by the Boolean functions x\(t + 1) = Xi(t) + X2(t) and 
x 2 (t + 1) — xi(t) ■ x~2~{t) + Xi(t) ■ x 2 (t) to illustrate the 
difference between the weight distribution of the deter- 
ministic case and the attractor probability distribution in 
the perturbed case. When we use a perturbation proba- 
bility of q = J , r(xi — > Ti) =0.1 for switching the value 
of a node after calculating the successor of a state, the 
deterministic transition matrix 




FIG. 6: (a): The deviation a between the weight distribu- 
tion w and the estimated basin probability distribution v de- 
creases for longer time series m (n = 10 and q = 0.01 fixed), 
(b): The deviation a becomes smaller with increasing network 
size n (m = 10, 000 and q = 0.01 fixed). 



This Boolean network divides its state space into two 
different basins of attraction: the first one consists of its 
fixed point attractor (00) while the second one contains 
the transient state (01) that is flowing into the attractor 
(10) (11)- Thus, the weight distribution is w\ = 0.25 
and W2 = 0.75. If we solve the steady state equation 
7T = 7rP, we obtain 7T( o) = 0.201, 7T(oi) = 0.0598, Trno) = 
0.3618, 7T(n) = 0.3775. We may sum up the probabilities 
of states contributing to the same attractor basin to get 
a probability distribution v of the basins, v\ = 0.201 and 
t>2 = 0.799. If the history of a trajectory is unknown, 
this distribution describes the probability of the network 
operating in a certain basin. It can be estimated with 
arbitrary precision by sampling the states of a time series. 
The perturbation probability q clearly affects v\ for large 
g, the network rules (connections, updating functions) 
lose their importance and the time series become random. 

For our described two-node example network, the basin 
weight distribution w and the basin probability distribu- 
tion v differ. We therefore studied how increasing net- 
work size affects the average deviation, 



( 1 





V o 



1 
1 

o / 



is replaced by the stochastic transition matrix 



( 0.81 0.09 0.09 0.01 \ 

0.01 0.09 0.09 0.81 

0.01 0.09 0.09 0.81 

\ 0.09 0.01 0.81 0.09 / 



where the states along the rows and columns are arranged 
in the usual lexicographic order (00, 01, 10, 11). 



(9) 



between the basin probability distribution v and the 
weight distribution w in different network ensembles. In 
Fig. EJa) the behavior of a is shown when increasingly 
long time series with q = 0.01 are used to estimate the 
basin probability distribution of critical n — 10 network 
ensembles. Fig. |6jb) shows that the mean deviation 
a decreases for increasing network sizes (q = 0.01 and 
to = 10,000). Therefore, especially in larger networks, 
the basin weight distribution may be well estimated by 
the basin probability distribution. For all network en- 
sembles the average was taken over more than 10, 000 
network instances. 
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The assignment of a state in the time series to its at- 
tractor, if the underlying network rules are unknown, is a 
challenging classification problem. Two states occurring 
in the time series with comparable frequency may either 
belong to the same attractor or to two different attractors 
that just happen to have a similar weight or probability. 
In order to solve this problem, we have to make a second 
assumption: states that belong to the same attractor are 
more likely to occur in temporal proximity. Instead of 
looking at a single state, we consider all states in a time 
window of a certain length r. The size of the time window 
t generally has to be estimated from the time series data 
[38| . In a perturbed trajectory, generated from a Boolean 
network, r should be on the order of the expected attrac- 
tor lengths. In a model based on a biological circuit, the 
choice for the expected length may also be motivated by 
'biological intuition'. 

The classification of these time profiles into several at- 
tractors is a clustering problem, where the number of 
clusters is not known. Many algorithms to estimate the 
optimal number of clusters in a data set have been devel- 
oped and extensively studied. Generally, more free pa- 
rameters (clusters) enable to further minimize the error 
function on which the cluster algorithm is based. This 
better 'fit' is then penalized by a term growing with the 
number of free parameters. Based on this trade-off crite- 
rion, an 'optimal' clustering model can be found. When 
dealing with biological data, a range for the number of 
expected clusters (e.g. different cell states) can also be 
provided by the experimentalist. It is not our intention 
here to discuss the challenges of clustering and we refer 
the interested reader to the extensive literature in this 
field [HI, [HI], [34j]. However, for illustrational purposes, 
we sketch the analysis of the already introduced network 
of the mammalian cell cycle by a perturbed time series 
and clustering. 

We generated a time series of m = 10, 000 successor 
states from the Boolean model of the mammalian cell cy- 
cle. The value of every node was flipped with probability 
q = 0.01 after calculating the successor state. Profiles 
were generated by adding up the values of a node during 
the time window r: 

t 

Ci (t)= Yl x dt') (io) 

t' = t-T 

with Ci 6 {0, t + 1}. Different values of r have been 
tested: r = 4, . . . , 10. The distance between two profiles 
c = [c\, . . . , c„) and c' = (c' 1; . . . , c' n ) was measured by 
the city block (Li) distance: 

n 

d(c,c')= J>-c$|. (11) 

1=1 

The profiles were then clustered by the fc-means algo- 
rithm [331 ] . In order to determine the optimal number 
of clusters, the Bayesian information criterion (BIC) and 
Akaike information criterion (AIC) have been used and 



yielded an optimal number of two clusters for all used r. 
This correctly corresponds to the two attractors of the 
underlying Boolean network, the fixed-point attractor of 
the Go phase and the attractor of length seven of the 
cell cycle. The classification of the time series into two 
attractors yields a probability distribution v, whose en- 
tropy h = 0.691 is close to the 'true' network's entropy, 
h ~ In 2. This example demonstrates how the attractors 
and their weight distribution, a dynamical property of 
the network, can be derived from a time series using a 
straightforward clustering approach that does not require 
knowledge about the underlying network rules. 



VII. DISCUSSION 

We studied the average entropy and transient time of 
random complex relevant components near criticality as 
a function of their excess. This was motivated in part by 
new analytical results on the degree distribution of rele- 
vant nodes in critical network ensembles [2l| . In random 
graphs of such given degree distributions, most nodes are 
organized with high probability as a single giant compo- 
nent [H|],[26j]. The regulatory elements in gene networks, 
on the other hand, seem to be organized in a more modu- 
lar manner [l3| , [l2j ■ This raises the question of whether 
(ordinary) critical random Boolean networks that have 
been successfully used as models for the study of gene 
regulatory dynamics still lack characteristic properties of 
their biological archetypes. 

When we consider, for example, the excess of a net- 
work as a fixed constraint, a modular organization of the 
interacting nodes will yield a higher basin entropy and 
a shorter average transient time 39 . One might ask if a 
maximization of the basin entropy or a minimization of 
the transient time are desirable features in biological net- 
works. A short transient time might, for instance, speed 
up the process of settling down to a certain cellular state 
(corresponding to an attractor) whereas a high entropy 
would minimize the required number of genes to per- 
form a classification/decision task. Thus, the ability to 
rapidly respond to environmental cues by switching to a 
particular functional cellular state as well as the economy 
with which cellular information processing and decision- 
making can be carried out may be evolutionarily selected 
features of biological networks. 

With the declining costs of microarray and other high- 
throughput technologies, time series data will be increas- 
ingly available from biological experiments, enabling net- 



A critical relevant component of n = 10 nodes and excess e = 10 
has an average entropy of (h) 0.6. If two such components 
with a combined entropy of (h) rj 1.2 merge into a n = 20 
node component of excess e = 20 and are randomly rewired, the 
entropy decreases to an average value of (h) ~ 0.7. The average 
transient time, on the other hand, increases from \ ~ 5.5 to 
X~12. 
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work parameters such as basin entropy and transient time 
to be studied in a biological context. Our approach to 
estimate entropy from time series data sketches one pos- 
sible way how that might be accomplished. 
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